25 research outputs found

    Framework for data quality in knowledge discovery tasks

    The creation and consumption of data continue to grow by leaps and bounds. Owing to advances in Information and Communication Technologies (ICT), the data explosion in the digital universe is now a trend. Knowledge Discovery in Databases (KDD) has gained importance because of the abundance of available data. A successful knowledge discovery process requires data preparation; experts state that the preprocessing phase takes 50% to 70% of the total time of a knowledge discovery process. Software tools based on popular knowledge discovery methodologies offer algorithms for data preprocessing. According to the 2018 Gartner Magic Quadrant for Data Science and Machine Learning Platforms, KNIME, RapidMiner, SAS, Alteryx, and H2O.ai are the leading tools for knowledge discovery. These tools provide various techniques that facilitate the evaluation of the dataset, but they lack a user-oriented process for addressing data quality problems and give no guidance on which techniques can or should be used in which contexts. Consequently, selecting suitable data cleaning techniques is difficult for inexperienced users, who do not know which methods can be used with confidence and often resort to trial and error. This thesis presents three contributions to address these problems: (i) a conceptual framework that provides a guided process for addressing data quality issues in knowledge discovery tasks, (ii) a case-based reasoning system that recommends suitable algorithms for data cleaning, and (iii) an ontology that represents knowledge about data quality issues and data cleaning algorithms. The ontology also supports the case-based reasoning system in the formal representation of cases and in the adaptation (reuse) phase.
    Programa Oficial de Doctorado en Ciencia y Tecnología Informática. Chair: Fernando Fernández Rebollo. Secretary: Gustavo Adolfo Ramírez. Member: Juan Pedro Caraça-Valente Hernánde

    A case-based reasoning system for recommendation of data cleaning algorithms in classification and regression tasks

    Recent advances in Information Technologies (social networks, mobile applications, Internet of Things, etc.) generate a deluge of digital data, but converting these data into useful information for business decisions is a growing challenge. Exploiting this massive amount of data through a knowledge discovery (KD) process involves identifying valid, novel, potentially useful, and understandable patterns in a huge volume of data. However, preparing the data is a non-trivial refinement task that requires technical expertise in data cleaning methods and algorithms. Consequently, choosing a suitable data analysis technique is difficult for inexperienced users. To address these problems, we propose a case-based reasoning (CBR) system that recommends data cleaning algorithms for classification and regression tasks. In our approach, the problem space is represented by the meta-features of the dataset, its attributes, and the target variable. The solution space contains the data cleaning algorithms used for each dataset. Cases are represented through a Data Cleaning Ontology. The case retrieval mechanism consists of a filter phase and a similarity phase. For the first phase, we defined two filter approaches, based on clustering and on quartile analysis, which retrieve a reduced number of relevant cases. The second phase scores the similarity between the new case and each case retrieved by the filters, and ranks the retrieved cases accordingly. The proposed retrieval mechanism was evaluated by a panel of judges, who scored the similarity between a query case and all cases in the case base (ground truth). Measured against the judges' ranking, the retrieval mechanism reaches an average precision of 94.5% in the top 3, 84.55% in the top 7, and 78.35% in the top 10.
    The authors are grateful to the research groups Control Learning Systems Optimization Group (CAOS) of the Carlos III University of Madrid and Telematics Engineering Group (GIT) of the University of Cauca for their technical support. In addition, the authors are grateful to COLCIENCIAS for the PhD scholarship granted to David Camilo Corrales. This work has also been supported by the project Alternativas Innovadoras de Agricultura Inteligente para sistemas productivos agrícolas del departamento del Cauca soportado en entornos de IoT, financed by Convocatoria 04C-2018 Banco de Proyectos Conjuntos UEES-Sostenibilidad of the project Red de formación de talento humano para la innovación social y productiva en el Departamento del Cauca InnovAcción Cauca, ID-3848, and by the Spanish Ministry of Economy, Industry and Competitiveness (Projects TRA2015-63708-R and TRA2016-78886-C3-1-R)
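The two-phase retrieval described in this abstract (a filter phase followed by a similarity ranking) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the case layout (`features`, `algorithm`), the function names `quartile_index` and `retrieve`, the single-feature quartile filter, and the distance-based similarity score are all hypothetical choices made for the sketch.

```python
import math
from statistics import quantiles


def quartile_index(value, cuts):
    """Return 0..3: which quartile bucket `value` falls into, given three cut points."""
    return sum(value > c for c in cuts)


def retrieve(query, case_base, filter_key, top_k=3):
    """Hypothetical two-phase retrieval: quartile filter, then similarity ranking."""
    # Phase 1 (filter): keep only cases whose value for one meta-feature
    # lies in the same quartile bucket as the query's value.
    cuts = quantiles([c["features"][filter_key] for c in case_base], n=4)
    bucket = quartile_index(query[filter_key], cuts)
    kept = [
        c for c in case_base
        if quartile_index(c["features"][filter_key], cuts) == bucket
    ]

    # Phase 2 (similarity): score each kept case against the query.
    # Assumed score: 1 / (1 + Euclidean distance); 1.0 means identical meta-features.
    def similarity(case):
        dist = math.sqrt(
            sum((query[k] - case["features"][k]) ** 2 for k in query)
        )
        return 1.0 / (1.0 + dist)

    return sorted(kept, key=similarity, reverse=True)[:top_k]
```

In practice the meta-features would need to be normalized before computing distances, since an unscaled feature such as a row count would otherwise dominate the score.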

    Centrality evolution of the charged-particle pseudorapidity density over a broad pseudorapidity range in Pb-Pb collisions at √sNN=2.76 TeV

    Peer reviewed

    Divulgación Científica No. 5

    In Ibero-American countries, as in other regions of the world, there is an ongoing discussion about the deterioration of the environment. Higher education institutions have a vast body of publications that we want to share with all interested parties and with society. For this reason, the Asociación de Editoriales Universitarias de América Latina y el Caribe (Eulac)*, continuing our Enlazadas university presses project, presents this year the proposal Enlazadas por el medioambiente (Linked for the Environment), which will address this concern from various angles at the most important book fairs in the world

    Effect of the COVID-19 pandemic on surgery for indeterminate thyroid nodules (THYCOVID): a retrospective, international, multicentre, cross-sectional study

    Background: Since its outbreak in early 2020, the COVID-19 pandemic has diverted resources from non-urgent and elective procedures, leading to diagnosis and treatment delays, with an increased number of neoplasms at advanced stages worldwide. The aims of this study were to quantify the reduction in surgical activity for indeterminate thyroid nodules during the COVID-19 pandemic; and to evaluate whether delays in surgery led to an increased occurrence of aggressive tumours. Methods: In this retrospective, international, cross-sectional study, centres were invited to participate on June 22, 2022; each centre joining the study was asked to provide data from medical records on all surgical thyroidectomies consecutively performed from Jan 1, 2019, to Dec 31, 2021. Patients with indeterminate thyroid nodules were divided into three groups according to when they underwent surgery: from Jan 1, 2019, to Feb 29, 2020 (global prepandemic phase), from March 1, 2020, to May 31, 2021 (pandemic escalation phase), and from June 1 to Dec 31, 2021 (pandemic decrease phase). The main outcomes were, for each phase, the number of surgeries for indeterminate thyroid nodules, and in patients with a postoperative diagnosis of thyroid cancers, the occurrence of tumours larger than 10 mm, extrathyroidal extension, lymph node metastases, vascular invasion, distant metastases, and tumours at high risk of structural disease recurrence. Univariate analysis was used to compare the probability of aggressive thyroid features between the first and third study phases. The study was registered on ClinicalTrials.gov, NCT05178186. Findings: Data from 157 centres (n=49 countries) on 87 467 patients who underwent surgery for benign and malignant thyroid disease were collected, of whom 22 974 patients (18 052 [78·6%] female patients and 4922 [21·4%] male patients) received surgery for indeterminate thyroid nodules. 
We observed a significant reduction in surgery for indeterminate thyroid nodules during the pandemic escalation phase (median monthly surgeries per centre, 1·4 [IQR 0·6-3·4]) compared with the prepandemic phase (2·0 [0·9-3·7]; p<0·0001) and pandemic decrease phase (2·3 [1·0-5·0]; p<0·0001). Compared with the prepandemic phase, in the pandemic decrease phase we observed an increased occurrence of thyroid tumours larger than 10 mm (2554 [69·0%] of 3704 vs 1515 [71·5%] of 2119; OR 1·1 [95% CI 1·0-1·3]; p=0·042), lymph node metastases (343 [9·3%] vs 264 [12·5%]; OR 1·4 [1·2-1·7]; p=0·0001), and tumours at high risk of structural disease recurrence (203 [5·7%] of 3584 vs 155 [7·7%] of 2006; OR 1·4 [1·1-1·7]; p=0·0039). Interpretation: Our study suggests that the reduction in surgical activity for indeterminate thyroid nodules during the COVID-19 pandemic period could have led to an increased occurrence of aggressive thyroid tumours. However, other compelling hypotheses, including increased selection of patients with aggressive malignancies during this period, should be considered. We suggest that surgery for indeterminate thyroid nodules should no longer be postponed even in future instances of pandemic escalation. Funding: None
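As a rough check on the odds ratios quoted in the Findings, the point estimates can be recomputed from the counts given in the abstract. The sketch below assumes a plain unadjusted odds ratio (odds in the pandemic decrease phase divided by odds in the prepandemic phase); the function name `odds_ratio` is hypothetical, and the abstract does not state the exact estimation method beyond "univariate analysis".

```python
def odds_ratio(events_ref, total_ref, events_cmp, total_cmp):
    """Unadjusted odds ratio of the comparison group relative to the reference group."""
    # Odds = patients with the feature / patients without it.
    odds_ref = events_ref / (total_ref - events_ref)
    odds_cmp = events_cmp / (total_cmp - events_cmp)
    return odds_cmp / odds_ref


# Tumours larger than 10 mm: 2554 of 3704 (prepandemic) vs 1515 of 2119 (decrease phase).
or_size = odds_ratio(2554, 3704, 1515, 2119)

# High risk of structural disease recurrence: 203 of 3584 vs 155 of 2006.
or_risk = odds_ratio(203, 3584, 155, 2006)

print(round(or_size, 1), round(or_risk, 1))
```

Rounded to one decimal place, both values agree with the reported point estimates of 1·1 and 1·4.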

    Coherent J/ψ photoproduction in ultra-peripheral Pb–Pb collisions at √sNN=2.76 TeV

    The ALICE Collaboration has made the first measurement at the LHC of J/ψ photoproduction in ultra-peripheral Pb–Pb collisions at √sNN=2.76 TeV. The J/ψ is identified via its dimuon decay in the forward rapidity region with the muon spectrometer for events where the hadronic activity is required to be minimal. The analysis is based on an event sample corresponding to an integrated luminosity of about $55\ \mu\mathrm{b}^{-1}$. The cross section for coherent J/ψ production in the rapidity interval −3.6<y<−2.6 is measured to be $\mathrm{d}\sigma^{\mathrm{coh}}_{J/\psi}/\mathrm{d}y = 1.00 \pm 0.18\,(\mathrm{stat})\,^{+0.24}_{-0.26}\,(\mathrm{syst})$ mb. The result is compared to theoretical models for coherent J/ψ production and found to be in good agreement with those models which include nuclear gluon shadowing

    Inclusive J/ψ production in pp collisions at √s=2.76 TeV

    The ALICE Collaboration has measured inclusive J/ψ production in pp collisions at a center-of-mass energy √s=2.76 TeV at the LHC. The results presented in this Letter refer to the rapidity ranges |y|<0.9 and 2.5<y<4 and have been obtained by measuring the electron and muon pair decay channels, respectively. The integrated luminosities for the two channels are $L^{e}_{\mathrm{int}} = 1.1\ \mathrm{nb}^{-1}$ and $L^{\mu}_{\mathrm{int}} = 19.9\ \mathrm{nb}^{-1}$, and the corresponding signal statistics are $N^{e^{+}e^{-}}_{J/\psi} = 59 \pm 14$ and $N^{\mu^{+}\mu^{-}}_{J/\psi} = 1364 \pm 53$. We present $\mathrm{d}\sigma_{J/\psi}/\mathrm{d}y$ for the two rapidity regions under study and, for the forward-y range, $\mathrm{d}^{2}\sigma_{J/\psi}/\mathrm{d}y\,\mathrm{d}p_{\mathrm{t}}$ in the transverse momentum domain 0<pt<8 GeV/c. The results are compared with previously published results at √s=7 TeV and with theoretical calculations

    Multi-strange baryon production in pp collisions at √s=7 TeV with ALICE

    A measurement of the multi-strange Ξ− and Ω− baryons and their antiparticles by the ALICE experiment at the CERN Large Hadron Collider (LHC) is presented for inelastic proton–proton collisions at a centre-of-mass energy of 7 TeV. The transverse momentum (pT) distributions were studied at mid-rapidity (|y|6.0 GeV/c. We also illustrate the difference between the experimental data and model by comparing the corresponding ratios of $(\Omega^{-}+\bar{\Omega}^{+})/(\Xi^{-}+\bar{\Xi}^{+})$ as a function of transverse mass

    Neutral pion and η meson production in proton–proton collisions at √s=0.9 TeV and √s=7 TeV

    The first measurements of the invariant differential cross sections of inclusive π0 and η meson production at mid-rapidity in proton–proton collisions at √s=0.9 TeV and √s=7 TeV are reported. The π0 measurement covers the ranges 0.4<pT<7 GeV/c and 0.3<pT<25 GeV/c for these two energies, respectively. The production of η mesons was measured at √s=7 TeV in the range 0.4<pT<15 GeV/c. Next-to-Leading Order perturbative QCD calculations, which are consistent with the π0 spectrum at √s=0.9 TeV, overestimate those of π0 and η mesons at √s=7 TeV, but agree with the measured η/π0 ratio at √s=7 TeV